Introduction

Hearthstone is a popular collectible card game released by Blizzard Entertainment in 2014 and based on the company’s Warcraft series. Each player builds a deck of 30 cards and tries to defeat an opponent who also has a deck of 30 cards.

In Hearthstone, cards can be classified according to the following categories:

  • Class: Neutral cards can be used by all nine classes, while Class cards can only be used by the indicated class.
  • Rarity: How often a card is found when opening card packs.
    • Cards are indicated as Free, Common, Rare, Epic and Legendary, ordered by increasing rarity.
  • Type: Different types of cards have different effects in the game:
    • Minions are played on the game board and can attack Heroes or other minions.
    • Spells are Class abilities that generate a variety of effects on the board.
    • Weapons are items that Heroes can equip to attack Heroes or other minions.
    • Heroes represent the player; the player loses when the Hero’s health reaches 0.
  • Set: Cards are released in designated sets, which are usually based on a certain theme in the Warcraft universe.
    • The core game consists of two sets, Basic and Classic, which are always available in the game.
    • Expansion sets add newer cards to the game and are released at regular intervals.

Approaching the question

Which are the most popular cards used in Ranked decks?

We focus on the Ranked format, where players decide which cards to include in their decks. Card popularity is therefore represented more accurately, and gameplay is not subject to the additional constraints that other game modes (like Tavern Brawls and Adventures) may impose.

How do we determine popularity?

We measure a card’s popularity by the number of decks that include at least 1 copy of it among the starting 30 cards; copies generated by other effects during a game do not count.

A deck can include at most 2 copies of any card (1 for Legendary cards), so a card’s popularity is not heavily influenced by the number of copies players choose to use.
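The counting rule above can be sketched in base R as follows. This is only an illustration under stated assumptions: the toy data frame has 3 card slots instead of 30, and the card IDs are invented.

```r
# Toy deck list: one row per deck, one column per card slot.
# Card IDs and deck IDs are hypothetical.
decks <- data.frame(
  deck_id = c(1, 2, 3),
  card_0  = c(101, 101, 103),
  card_1  = c(101, 102, 104),
  card_2  = c(102, 103, 104)
)

card_cols <- grep("^card_", names(decks), value = TRUE)

# Long format: one row per (deck, card slot).
# unlist() stacks the card columns one after another, so deck_id is repeated per column.
long <- data.frame(
  deck_id = rep(decks$deck_id, times = length(card_cols)),
  card_id = unlist(decks[card_cols], use.names = FALSE)
)

# Drop duplicate copies so that each deck counts a card at most once.
long <- unique(long)

# Popularity = number of distinct decks that contain the card.
popularity <- sort(table(long$card_id), decreasing = TRUE)
print(popularity)
```

Deduplicating before counting is what makes a deck with 2 copies of a card weigh the same as a deck with 1 copy.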

Possible biases to consider

Since Neutral cards can be used by multiple classes, we expect them to appear in more decks than Class-specific cards.

For the Wild format, cards from the older sets may be more popular simply because they have been in the game longer.

For the Standard format, cards from the Basic and Classic sets may be more popular because, unlike expansion cards, they do not rotate out of the format.

Why address such a question?

If a certain card becomes too popular (i.e. the community thinks players must include it in their decks), it reduces card variety in the metagame and makes gameplay frustrating for other players (amongst other consequences). In the long term, this may lead to player attrition and a loss of potential revenue from purchases of card packs and cosmetics.

Historically, Blizzard has dealt with problematic cards in one of several ways:

Exploring the Data

We will use three datasets in this analysis:

  • data.csv contains a list of decks submitted by players to HearthPwn from 2013 (pre-launch) to 2017.
  • refs.json contains detailed information about all cards (collectible and non-collectible) up to March 2017.
  • cards_collectible.json contains detailed information about the cards that are collectible in the game (up to August 2018).

The Decks data

Here is a preview of the data’s first few rows and columns:

| craft_cost | date | deck_archetype | deck_class | deck_format | deck_id | deck_set | deck_type | rating | title | user |
|---|---|---|---|---|---|---|---|---|---|---|
| 9740 | 2016-02-19 | Unknown | Priest | W | 433004 | Explorers | Tavern Brawl | 1 | Reno Priest | FunKaliTy |
| 9840 | 2016-02-19 | Unknown | Warrior | W | 433003 | Explorers | Ranked Deck | 1 | RoosterWarrior | RooosterRooo |
| 2600 | 2016-02-19 | Unknown | Mage | W | 433002 | Explorers | Theorycraft | 1 | Annoying | Messalm |
| 15600 | 2016-02-19 | Unknown | Warrior | W | 433001 | Explorers | None | 0 | Standart pay to win warrior | KingSneak |
| 7700 | 2016-02-19 | Unknown | Paladin | W | 432997 | Explorers | Ranked Deck | 1 | Palamix | kowdog_1507 |
| 5740 | 2016-02-19 | Unknown | Warrior | W | 432995 | Explorers | Ranked Deck | 2 | Kolento’s Elise Control Warrior | Kolento |
| 1800 | 2016-02-19 | Unknown | Warrior | W | 432998 | Explorers | Arena | 1 | Arena Simulation #484 | regniwO |
| 1800 | 2016-02-19 | Unknown | Warlock | W | 432993 | Explorers | Arena | 1 | Arena Simulation #483 | regniwO |
| 8780 | 2016-02-19 | Unknown | Priest | W | 432992 | Explorers | Ranked Deck | 1 | Djinn Dragon Priest (S23) | Scilex96 |
| 4080 | 2016-02-19 | Unknown | Warlock | W | 432991 | Explorers | Tavern Brawl | 1 | Battlecry Zoo-Malygos Combo | BuzyBRx |

This dataset has 346232 rows and 41 columns. The columns craft_cost to user describe the deck’s attributes (like date submitted, class and deck format), while the columns card_0 to card_29 list the deck’s cards by their card IDs. Detailed information on the variables can be found on Kaggle: History of Hearthstone.

##  [1] "craft_cost"     "date"           "deck_archetype" "deck_class"    
##  [5] "deck_format"    "deck_id"        "deck_set"       "deck_type"     
##  [9] "rating"         "title"          "user"           "card_0"        
## [13] "card_1"         "card_2"         "card_3"         "card_4"        
## [17] "card_5"         "card_6"         "card_7"         "card_8"        
## [21] "card_9"         "card_10"        "card_11"        "card_12"       
## [25] "card_13"        "card_14"        "card_15"        "card_16"       
## [29] "card_17"        "card_18"        "card_19"        "card_20"       
## [33] "card_21"        "card_22"        "card_23"        "card_24"       
## [37] "card_25"        "card_26"        "card_27"        "card_28"       
## [41] "card_29"

There are 8 rows that contain missing data. All the missing values are in the title column, so they can be safely ignored.

##         row col
## [1,]  16747  10
## [2,] 175608  10
## [3,] 216047  10
## [4,] 238021  10
## [5,] 278491  10
## [6,] 326192  10
## [7,] 329285  10
## [8,] 329286  10

The rows with a missing title are:

| craft_cost | date | deck_archetype | deck_class | deck_format | deck_id | deck_set | deck_type | rating | title | user |
|---|---|---|---|---|---|---|---|---|---|---|
| 3580 | 2016-06-22 | Unknown | Hunter | S | 576543 | Old Gods | Theorycraft | 1 | NA | Laurentiuspullo |
| 2640 | 2014-07-20 | Unknown | Rogue | S | 74841 | Live Patch 5506 | Ranked Deck | 0 | NA | goodsound |
| 2180 | 2013-11-19 | Unknown | Priest | S | 17994 | Beta Patch 3937 | Arena | 1 | NA | zevsmkgad |
| 140 | 2013-11-03 | Unknown | Hunter | S | 15525 | Beta Patch 3937 | None | 1 | NA | roldan2003bis |
| 5660 | 2015-08-30 | Unknown | Paladin | W | 318748 | TGT Launch | None | 1 | NA | Reiken777 |
| 2940 | 2015-12-23 | Unknown | Shaman | W | 400510 | Explorers | None | 1 | NA | Lex456 |
| 2500 | 2015-12-20 | Unknown | Shaman | W | 399274 | Explorers | None | 1 | NA | KingBeda |
| 2500 | 2015-12-20 | Unknown | Shaman | W | 399273 | Explorers | None | 1 | NA | KingBeda |
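
A check like the one above, locating missing values by row and column, could be written in base R roughly as follows. The three-row data frame is a made-up stand-in for the real deck data.

```r
# Hypothetical stand-in for the deck data, with one missing title.
decks_raw <- data.frame(
  deck_id = c(1, 2, 3),
  title   = c("Reno Priest", NA, "Palamix"),
  user    = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Matrix of (row, col) positions, one line per missing value.
na_pos <- which(is.na(decks_raw), arr.ind = TRUE)
print(na_pos)

# Names of the columns that contain at least one missing value.
na_cols <- names(decks_raw)[unique(na_pos[, "col"])]
print(na_cols)
```

If na_cols contains only title, the missing values do not touch the card columns used for counting popularity.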

Cards data

Additional information on the variables can be found on HearthstoneJSON.

There are two identifier fields for the cards: a character/string id and an integer dbfId. The decks_raw dataset uses the integer IDs to reference cards used.

## [1] 1751   65

The following computes the number of missing values in each field, except for the fields stored as lists or data frames (mechanics, referencedTags, classes, entourage).
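
One possible way to do this in base R is sketched below. The small cards_raw data frame and the contents of its mechanics column are invented for illustration; the real data has 65 columns.

```r
# Hypothetical stand-in for the card data, with one nested (list) column.
cards_raw <- data.frame(
  dbfId = c(940L, 179L, 836L),
  name  = c("Cleave", "Wisp", "Wrath"),
  cost  = c(2L, NA, 2L),
  stringsAsFactors = FALSE
)
cards_raw$mechanics <- I(list("TAG_A", character(0), c("TAG_B", "TAG_C")))

# Flag columns stored as lists (nested data frames are also lists in R).
is_nested <- vapply(cards_raw, is.list, logical(1))

# Count missing values in the remaining flat columns only.
na_counts <- colSums(is.na(cards_raw[!is_nested]))
print(na_counts)
```

Skipping the nested columns avoids having to decide what "missing" means for an empty list entry.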

Pre-processing

Decks data

The raw dataset is split into two, one containing the deck attributes and the other containing the deck composition (cards), with deck_id acting as the unique identifier. We also exclude decks created before launch (there were many card changes in the alpha and beta stages, making card popularity very volatile).

The decks_comp data will be pivoted to long format later on, thus excluding fields that are not related to the cards will minimize the size of the dataset.

Within decks_attr, the factor/enumerated columns are identified and recast accordingly.
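
The steps described above (dropping pre-launch decks, splitting the data and recasting columns) might look roughly like this in base R. The toy data has 2 card slots instead of 30, and the choice of which columns to treat as factors is an assumption made for the sketch.

```r
# Hypothetical stand-in for the raw deck data.
decks_raw <- data.frame(
  deck_id     = c(10, 11, 12),
  date        = as.Date(c("2013-11-19", "2014-07-20", "2016-02-19")),
  deck_class  = c("Priest", "Rogue", "Mage"),
  deck_format = c("S", "S", "W"),
  title       = c("a", "b", "c"),
  card_0      = c(1, 2, 3),
  card_1      = c(4, 5, 6),
  stringsAsFactors = FALSE
)

# Keep only decks created on or after launch (2014-03-11).
launch_date <- as.Date("2014-03-11")
decks_live  <- decks_raw[decks_raw$date >= launch_date, ]

# Split into attributes and composition; deck_id links the two.
card_cols  <- grep("^card_", names(decks_live), value = TRUE)
decks_attr <- decks_live[setdiff(names(decks_live), card_cols)]
decks_comp <- decks_live[c("deck_id", card_cols)]

# Recast the enumerated columns as factors (column choice is illustrative).
factor_cols <- c("deck_class", "deck_format")
decks_attr[factor_cols] <- lapply(decks_attr[factor_cols], factor)
str(decks_attr)
```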

Years and Months

While each deck has a submission date, we may also be interested in grouping the decks by month (which corresponds to Ranked seasons) and by year (which is marked by expansion release dates instead of calendar dates).

Years in the game are defined as a time period that:

  • Starts with the release of the first card set of each year, which usually falls around April.
  • Ends with the release of the first card set the next year (non-inclusive).

So based on the release dates, the years would be:

  • 2014-03-11 to 2015-04-01 (Live, Naxxramas, Goblins vs Gnomes)
  • 2015-04-02 to 2016-04-25 (Blackrock, Grand Tournament, League of Explorers)
  • 2016-04-26 to 2017-03-19 (Old Gods, Karazhan, Gadgetzan)
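
The year boundaries listed above can be turned into a grouping variable with a lookup against the start dates. The following is a base R sketch: the hsyear and hsmonth names follow the summary output shown later, but the "YYYY-MM" month format is an assumption and may differ from what the analysis actually used.

```r
# Start date of each Hearthstone year, taken from the list above.
year_starts <- as.Date(c("2014-03-11", "2015-04-02", "2016-04-26"))
year_labels <- c("2014", "2015", "2016")

# Example submission dates on both sides of each boundary.
dates <- as.Date(c("2014-03-11", "2015-04-01", "2015-04-02",
                   "2016-04-25", "2016-04-26", "2017-03-19"))

# findInterval() returns the index of the last start date <= each date.
# This assumes pre-launch decks were removed already (they would get index 0).
idx    <- findInterval(as.numeric(dates), as.numeric(year_starts))
hsyear <- factor(year_labels[idx], levels = year_labels)

# Calendar month of submission, used as a proxy for the Ranked season.
hsmonth <- format(dates, "%Y-%m")

print(data.frame(date = dates, hsyear = hsyear, hsmonth = hsmonth))
```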

Deck Format

The Standard and Wild formats were formally introduced into the game on 2016-04-26 with the release of Whispers of the Old Gods; however, the graphic below shows that many decks from June 2014 to April 2016 were marked as Wild.

Since cards were not separated by format before that date, we can simply relabel all decks created before 2016-04-26 as Standard:
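
A minimal base R sketch of this relabelling, using a toy data frame (the rows are invented; only the 2016-04-26 cut-off and the S/W codes come from the data above):

```r
# Hypothetical decks, all initially marked as Wild ("W").
decks_attr <- data.frame(
  deck_id     = c(1, 2, 3),
  date        = as.Date(c("2015-08-30", "2016-04-25", "2016-06-22")),
  deck_format = factor(c("W", "W", "W"), levels = c("S", "W"))
)

# Date on which the Standard/Wild split took effect.
format_split <- as.Date("2016-04-26")

# Decks created before the split are relabelled as Standard ("S").
decks_attr$deck_format[decks_attr$date < format_split] <- "S"
print(table(decks_attr$deck_format))
```

Because "S" is already a level of the factor, the assignment works without introducing missing values.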

A summary of the processed data is shown below:

##     deck_id         craft_cost         date           
##  Min.   : 36923   Min.   :    0   Min.   :2014-03-11  
##  1st Qu.:253573   1st Qu.: 2840   1st Qu.:2015-05-26  
##  Median :428597   Median : 5120   Median :2016-02-09  
##  Mean   :419989   Mean   : 5745   Mean   :2015-12-21  
##  3rd Qu.:603508   3rd Qu.: 7840   3rd Qu.:2016-08-09  
##  Max.   :749548   Max.   :48000   Max.   :2017-03-19  
##                                                       
##          deck_archetype     deck_class    deck_format
##  Unknown        :220501   Mage   :42230   S:307743   
##  Midrange Shaman:  5472   Priest :41756   W: 16361   
##  Control Priest :  5135   Paladin:39368              
##  Control Warrior:  4939   Warlock:35598              
##  Tempo Mage     :  4545   Druid  :35488              
##  Midrange Hunter:  4371   Shaman :33969              
##  (Other)        : 79141   (Other):95695              
##              deck_set              deck_type          rating        
##  Explorers       : 57307   Arena        :  8178   Min.   :   0.000  
##  Old Gods        : 49895   None         : 75120   1st Qu.:   1.000  
##  Blackrock Launch: 38900   PvE Adventure:  9059   Median :   1.000  
##  Gadgetzan       : 31329   Ranked Deck  :202104   Mean   :   2.777  
##  Naxx Launch     : 22283   Tavern Brawl :  6360   3rd Qu.:   1.000  
##  Yogg Nerf       : 22175   Theorycraft  : 19686   Max.   :4016.000  
##  (Other)         :102215   Tournament   :  3597                     
##     title               user              hsmonth      hsyear      
##  Length:324104      Length:324104      Min.   :2014   2014: 65119  
##  Class :character   Class :character   1st Qu.:2015   2015:128062  
##  Mode  :character   Mode  :character   Median :2016   2016:130923  
##                                        Mean   :2016                
##                                        3rd Qu.:2017                
##                                        Max.   :2017                
## 

Cards data

Many columns in the cards_raw data pertain to a card’s stats, mechanics and play requirements, which are better explained by the text on the card image. We therefore keep only the columns that we consider core properties of the card (and that the card text does not sufficiently explain):

## Observations: 1,751
## Variables: 9
## $ dbfId       <int> 2539, 2541, 2545, 2572, 2542, 2549, 2571, 2544, 25...
## $ name        <chr> "Flame Lance", "Effigy", "Fallen Hero", "Arcane Bl...
## $ cost        <int> 5, 3, 2, 1, 3, 4, 3, 6, 8, 5, 4, 4, 1, 3, 2, 2, 4,...
## $ cardClass   <chr> "MAGE", "MAGE", "MAGE", "MAGE", "MAGE", "MAGE", "M...
## $ rarity      <chr> "COMMON", "RARE", "RARE", "EPIC", "RARE", "COMMON"...
## $ type        <chr> "SPELL", "SPELL", "MINION", "SPELL", "SPELL", "MIN...
## $ set         <chr> "TGT", "TGT", "TGT", "TGT", "TGT", "TGT", "TGT", "...
## $ collectible <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TR...
## $ id          <chr> "AT_001", "AT_002", "AT_003", "AT_004", "AT_005", ...

The set column contains abbreviated names or nicknames and is not necessarily informative; we create a new column that uses the actual names of the card sets:

##  [1] "TGT"          "BOOMSDAY"     "BRM"          "GANGS"       
##  [5] "CORE"         "EXPERT1"      "HOF"          "NAXX"        
##  [9] "GILNEAS"      "GVG"          "HERO_SKINS"   "ICECROWN"    
## [13] "KARA"         "LOE"          "LOOTAPALOOZA" "OG"          
## [17] "UNGORO"
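
One way to build such a column is a named lookup vector, sketched here in base R. The lookup covers only a few of the 17 codes and should be read as an assumption for illustration, not as the full mapping used in the analysis; codes without an entry are left unchanged.

```r
# Partial, illustrative lookup from set code to full set name.
set_names <- c(
  CORE    = "Basic",
  EXPERT1 = "Classic",
  GVG     = "Goblins vs Gnomes",
  BRM     = "Blackrock Mountain",
  LOE     = "League of Explorers",
  OG      = "Whispers of the Old Gods",
  GANGS   = "Mean Streets of Gadgetzan"
)

cards <- data.frame(
  name = c("Cleave", "Wrath", "Jade Idol", "Flame Lance"),
  set  = c("CORE", "EXPERT1", "GANGS", "TGT"),
  stringsAsFactors = FALSE
)

# Look up each code; codes missing from the lookup come back as NA.
mapped <- unname(set_names[cards$set])

# Fall back to the original code when no full name is known.
cards$card_set <- ifelse(is.na(mapped), cards$set, mapped)
print(cards)
```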

Some columns are entirely uppercase, which we convert to title case for readability:

The factor/enumerated columns are then identified and recast accordingly.
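
The two steps above could be sketched in base R as follows. The to_title() helper is a hypothetical name introduced here; it capitalises only the first letter, which is enough for single-word values such as MAGE or COMMON.

```r
# Hypothetical helper: lower-case the value, then capitalise the first letter.
to_title <- function(x) {
  x <- tolower(x)
  paste0(toupper(substring(x, 1, 1)), substring(x, 2))
}

cards <- data.frame(
  cardClass = c("MAGE", "NEUTRAL"),
  rarity    = c("COMMON", "LEGENDARY"),
  type      = c("SPELL", "MINION"),
  stringsAsFactors = FALSE
)

# Convert the uppercase columns to title case and recast them as factors.
upper_cols <- c("cardClass", "rarity", "type")
cards[upper_cols] <- lapply(cards[upper_cols], function(col) factor(to_title(col)))
str(cards)
```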

A summary of the processed data is shown below:

##      dbfId           name                cost          cardClass  
##  Min.   :    7   Length:1751        Min.   : 0.000   Neutral:657  
##  1st Qu.: 1987   Class :character   1st Qu.: 2.000   Paladin:123  
##  Median :38957   Mode  :character   Median : 4.000   Hunter :122  
##  Mean   :25375                      Mean   : 3.856   Mage   :122  
##  3rd Qu.:43163                      3rd Qu.: 5.000   Warlock:122  
##  Max.   :53187                      Max.   :20.000   Druid  :121  
##                                     NA's   :22       (Other):484  
##        rarity        type      collectible         id           
##  Common   :612   Hero  :  33   Mode:logical   Length:1751       
##  Epic     :298   Minion:1192   TRUE:1751      Class :character  
##  Free     :142   Spell : 471                  Mode  :character  
##  Legendary:253   Weapon:  55                                    
##  Rare     :446                                                  
##                                                                 
##                                                                 
##                          card_set  
##  Classic                     :236  
##  Basic                       :142  
##  Journey to Un'Goro          :135  
##  Knights of the Frozen Throne:135  
##  Kobolds & Catacombs         :135  
##  The Boomsday Project        :135  
##  (Other)                     :833

Mislabelled cards

Because the decks are entered by users and several cards share the same name, we check for cards that are referenced by the right name but the wrong ID.

Specifically, we are looking for the version of each card that is collectible (since all cards used in Ranked decks must be collectible). This step requires loading the full card data (which contains non-collectible cards).

## [1] 15

To find the correct IDs, we join these cards by name to cards_simple. The dbfId.x on the left is the ID found in the decks, and it will be replaced by the dbfId.y on the right:

| dbfId.x | name | dbfId.y | cost | cardClass | rarity | type | collectible | id | card_set |
|---|---|---|---|---|---|---|---|---|---|
| 40341 | Cleave | 940 | 2 | Warrior | Free | Spell | TRUE | CS2_114 | Basic |
| 2177 | Dark Wispers | 2009 | 6 | Druid | Epic | Spell | TRUE | GVG_041 | Goblins vs Gnomes |
| 42146 | Doppelgangster | 40953 | 5 | Neutral | Rare | Minion | TRUE | CFM_668 | Mean Streets of Gadgetzan |
| 38319 | Druid of the Claw | 692 | 5 | Druid | Common | Minion | TRUE | EX1_165 | Classic |
| 2230 | Druid of the Fang | 2048 | 5 | Druid | Common | Minion | TRUE | GVG_080 | Goblins vs Gnomes |
| 2310 | Druid of the Flame | 2292 | 3 | Druid | Common | Minion | TRUE | BRM_010 | Blackrock Mountain |
| 40402 | Evolve | 38266 | 1 | Shaman | Rare | Spell | TRUE | OG_027 | Whispers of the Old Gods |
| 41409 | Jade Idol | 40372 | 1 | Druid | Rare | Spell | TRUE | CFM_602 | Mean Streets of Gadgetzan |
| 468 | Mark of Nature | 151 | 3 | Druid | Common | Spell | TRUE | EX1_155 | Classic |
| 41609 | Nefarian | 2261 | 9 | Neutral | Legendary | Minion | TRUE | BRM_030 | Blackrock Mountain |
| 38113 | Raven Idol | 13335 | 1 | Druid | Common | Spell | TRUE | LOE_115 | League of Explorers |
| 1161 | Starfall | 86 | 5 | Druid | Rare | Spell | TRUE | NEW1_007 | Classic |
| 38710 | Unstable Portal | 1929 | 2 | Mage | Rare | Spell | TRUE | GVG_003 | Goblins vs Gnomes |
| 38653 | Wisp | 179 | 0 | Neutral | Common | Minion | TRUE | CS2_231 | Classic |
| 137 | Wrath | 836 | 2 | Druid | Common | Spell | TRUE | EX1_154 | Classic |

We create a named list that can be used within recode():
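
The sketch below shows the idea in base R, using three of the ID pairs from the table above. It uses a named vector lookup as a stand-in for the recode() call, and the object names wrong, fixes and id_map are hypothetical.

```r
# Cards referenced in decks under a non-collectible ID (from the table above).
wrong <- data.frame(
  dbfId = c(40341L, 38653L, 137L),
  name  = c("Cleave", "Wisp", "Wrath"),
  stringsAsFactors = FALSE
)

# Collectible versions of the same cards.
cards_simple <- data.frame(
  dbfId = c(940L, 179L, 836L),
  name  = c("Cleave", "Wisp", "Wrath"),
  stringsAsFactors = FALSE
)

# Join by name: dbfId.x is the wrong ID, dbfId.y the collectible one.
fixes <- merge(wrong, cards_simple, by = "name", suffixes = c(".x", ".y"))

# Named vector: names are the wrong IDs, values are the correct IDs.
id_map <- setNames(fixes$dbfId.y, fixes$dbfId.x)

# Apply the mapping to some card IDs; IDs without an entry are left unchanged.
deck_cards <- c(40341L, 179L, 137L, 2539L)
key        <- as.character(deck_cards)
corrected  <- ifelse(key %in% names(id_map), id_map[key], deck_cards)
print(corrected)
```

The same named vector could also be converted to a list with as.list() if the recoding function expects one.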

Other objects

The following items may be used for plotting:

Summary and Reflection

So far, we have looked at the cards that users tend to include in Standard-format decks for Ranked play. This format is also used for official Hearthstone tournaments, which makes these popular cards highly visible to a wide audience. We have also looked at card popularity broken down by various categories, such as class, time period and card set.

Are there any limitations to the data that may have affected our analysis?

The major limitation of this data is that it only looks at decks submitted to a third-party website, which brings up the following issues:

  • Decks submitted may not necessarily be played in the game itself, either because other players think it is too weak, or because the user submits a joke deck that contains absurd combinations of cards and is not meant to be taken seriously.
  • There is no data on how often decks and cards are actually played in the Ranked format.
  • Likewise, there is no information on how effective the decks are at winning games. While the rating attribute may reflect how strong other players consider a deck, it is also biased towards the popularity of the user as well as the date of submission:
    • A deck may be initially strong and highly rated, but as new cards are introduced and old cards are removed from Standard format, the deck may wane in strength, but users are unlikely to retract their votes by this point in time.

How can we expand on this analysis?

  • Examine popular combinations of cards that complement each other well.
  • Examine whether the crafting cost (in dust) of a deck has any relation to its popularity (rating).